
    Correction of sequencing errors in a mixed set of reads

    Non peer reviewed

    The spindle assembly checkpoint as a drug target - Novel small-molecule inhibitors of Aurora kinases

    Cell division (mitosis) is a fundamental process in the life cycle of a cell. Equal distribution of chromosomes between the daughter cells is essential for the viability and well-being of an organism: loss of fidelity of cell division is a contributing factor in human cancer and also gives rise to miscarriages and genetic birth defects. To maintain the proper chromosome number, a cell must carefully monitor cell division in order to detect and correct mistakes before they are translated into chromosomal imbalance. For this purpose, an evolutionarily conserved mechanism termed the spindle assembly checkpoint (SAC) has evolved. The SAC comprises a complex network of proteins that relay and amplify mitosis-regulating signals created by assemblages called kinetochores (KTs). Importantly, minor defects in SAC signaling can cause loss or gain of individual chromosomes (aneuploidy), which promotes tumorigenesis, while complete failure of the SAC results in cell death. The latter event has raised interest in the discovery of low molecular weight (LMW) compounds targeting the SAC that could be developed into new anti-cancer therapeutics. In this study, we performed a cell-based, phenotypic high-throughput screen (HTS) to identify novel LMW compounds that inhibit SAC function and result in loss of cancer cell viability. Altogether, we screened 65 000 compounds and identified eight that forced the cells prematurely out of mitosis. The flavonoids fisetin and eupatorin, as well as the synthetic compounds termed SACi2 and SACi4, were characterized in more detail using versatile cell-based and biochemical assays. To identify the molecular targets of these SAC-suppressing compounds, we investigated the conditions in which SAC activity became abrogated. Eupatorin, SACi2 and SACi4 preferentially abolished the tension-sensitive arm of the SAC, whereas fisetin also lowered the SAC activity evoked by a lack of attachments between microtubules (MTs) and KTs. Consistent with the abrogation of the SAC in response to low tension, our data indicate that all four compounds inhibited the activity of Aurora B kinase. This essential mitotic protein is required for the correction of erratic MT-KT attachments, normal SAC signaling and execution of cytokinesis. Furthermore, eupatorin, SACi2 and SACi4 also inhibited Aurora A kinase, which controls centrosome maturation and separation and the formation of the mitotic spindle apparatus. In line with the established profound mitotic roles of Aurora kinases, these small compounds perturbed SAC function, caused spindle abnormalities such as multi- and monopolarity and fragmentation of centrosomes, and resulted in polyploidy due to defects in cytokinesis. Moreover, the compounds dramatically reduced the viability of cancer cells. Taken together, using a cell-based HTS we were able to identify new LMW compounds targeting the SAC. We demonstrated for the first time a novel function for flavonoids as cellular inhibitors of Aurora kinases. Collectively, our data support the concept that loss of mitotic fidelity due to a non-functional SAC can reduce the viability of cancer cells, a phenomenon that may possess therapeutic value and fuel the development of new anti-cancer drugs.

    HGGA : hierarchical guided genome assembler

    Background: De novo genome assembly typically produces a set of contigs instead of the complete genome. Thus, additional data such as genetic linkage maps, optical maps, or Hi-C data is needed to resolve the complete structure of the genome. Most of the previous work uses the additional data to order and orient contigs. Results: Here we introduce a framework to guide genome assembly with additional data. Our approach is based on clustering the reads such that each read in each cluster originates from nearby positions in the genome according to the additional data. These sets are then assembled independently, and the resulting contigs are further assembled in a hierarchical manner. We implemented our approach for genetic linkage maps in a tool called HGGA. Conclusions: Our experiments on simulated and real Pacific Biosciences long reads and genetic linkage maps show that HGGA produces a more contiguous assembly with fewer contigs and a 1.2 to 9.8 times higher NGA50 or N50 than a plain assembly of the reads, and a 1.03 to 6.5 times higher NGA50 or N50 than a previous approach integrating genetic linkage maps with contig assembly. Furthermore, the correctness of the assembly remains similar to, or improves on, that of an assembly using only the read data.
    Peer reviewed
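    A minimal sketch of the guided-assembly idea above, under stated assumptions: the linkage map is given as a mapping from read identifiers to map bins, and assemble() is a hypothetical placeholder for a real assembler. It only illustrates the cluster-then-merge-hierarchically flow and is not HGGA itself.

        # Illustrative sketch of hierarchical guided assembly (not HGGA itself).
        # Assumptions: `assemble` is a hypothetical black-box assembler and
        # `read_to_bin` maps each read id to its linkage-map bin.
        from collections import defaultdict

        def assemble(sequences):
            # Placeholder for a real assembler; here it returns its input unchanged.
            return list(sequences)

        def guided_assembly(reads, read_to_bin, group_size=2):
            # 1) Cluster reads so that each cluster comes from nearby genome
            #    positions according to the linkage map.
            clusters = defaultdict(list)
            for read_id, seq in reads.items():
                clusters[read_to_bin[read_id]].append(seq)

            # 2) Assemble each cluster independently.
            level = [assemble(clusters[b]) for b in sorted(clusters)]

            # 3) Hierarchically merge neighbouring clusters until a single set
            #    of contigs remains.
            while len(level) > 1:
                level = [assemble([c for part in level[i:i + group_size] for c in part])
                         for i in range(0, len(level), group_size)]
            return level[0]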

    Simple Runs-Bounded FM-Index Designs Are Fast

    Given a string X of length n on alphabet Σ, the FM-index data structure allows counting all occurrences of a pattern P of length m in O(m) time via an algorithm called backward search. An important difficulty when searching with an FM-index is to support queries on L, the Burrows-Wheeler transform of X, while L is in compressed form. This problem has been the subject of intense research for 25 years now. Run-length encoding of L is an effective way to reduce index size, in particular when the data being indexed is highly repetitive, which is the case in many types of modern data, including those arising from versioned document collections and in pangenomics. This paper takes a back-to-basics look at supporting backward search in FM-indexes, exploring and engineering two simple designs. The first divides the BWT string into blocks containing b symbols each and then run-length compresses each block separately, possibly introducing new runs (compared to applying run-length encoding once, to the whole string). Each block stores counts of each symbol that occurs before the block. This method supports the operation rank_c(L, i) (i.e., count the number of times c occurs in the prefix L[1..i]) by first determining the block ⌈i/b⌉ in which i falls and scanning the block to the appropriate position, counting occurrences of c along the way. This partial answer to rank_c(L, i) is then added to the stored count of c symbols before the block to determine the final answer. Our second design has a similar structure, but instead divides the run-length-encoded version of L into blocks containing an equal number of runs. The trick then is to determine the block in which a query falls, which is achieved via a predecessor query over the block starting positions. We show via extensive experiments on a wide range of repetitive text collections that these FM-indexes are not only easy to implement, but also fast and space efficient in practice.
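    The first design is easy to picture in code. Below is a minimal sketch, assuming 1-based rank queries and fixed-size blocks of b symbols that are run-length compressed separately; it only illustrates the idea and is not the authors' implementation, which packs runs and counts far more compactly.

        # Sketch of the first design: fixed-size blocks of the BWT, each
        # run-length compressed on its own, plus per-block counts of the
        # symbols that occur before the block.
        from collections import Counter

        class BlockedRLBWT:
            def __init__(self, L, b=64):
                self.b = b
                self.blocks = []   # run-length encoding of each block
                self.before = []   # counts of each symbol before the block
                seen = Counter()
                for start in range(0, len(L), b):
                    block = L[start:start + b]
                    runs = []
                    for c in block:
                        if runs and runs[-1][0] == c:
                            runs[-1][1] += 1
                        else:
                            runs.append([c, 1])
                    self.blocks.append(runs)
                    self.before.append(dict(seen))
                    seen.update(block)

            def rank(self, c, i):
                # Number of occurrences of c in L[1..i] (1-based, i >= 1).
                block_id = (i - 1) // self.b
                count = self.before[block_id].get(c, 0)
                remaining = i - block_id * self.b  # positions to scan in block
                for sym, length in self.blocks[block_id]:
                    take = min(length, remaining)
                    if sym == c:
                        count += take
                    remaining -= take
                    if remaining == 0:
                        break
                return count

        # Example on a small BWT-like string.
        bwt = "ard$rcaaaabb"
        idx = BlockedRLBWT(bwt, b=4)
        assert idx.rank('a', 7) == bwt[:7].count('a')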

    Computing all-vs-all MEMs in grammar-compressed text

    We describe a compression-aware method to compute all-vs-all maximal exact matches (MEMs) among strings of a repetitive collection 𝒯. The key concept in our work is the construction of a fully-balanced grammar 𝒢 from 𝒯 that meets a property that we call fix-free: the expansions of the nonterminals that have the same height in the parse tree form a fix-free set (i.e., prefix-free and suffix-free). The fix-free property allows us to compute the MEMs of 𝒯 incrementally over 𝒢 using a standard suffix-tree-based MEM algorithm, which runs on a subset of grammar rules at a time and does not decompress nonterminals. By modifying the locally-consistent grammar of Christiansen et al. (2020), we show how we can build 𝒢 from 𝒯 in linear time and space. We also demonstrate that our MEM algorithm runs on top of 𝒢 in O(G + occ) time and uses O(log G (G + occ)) bits, where G is the grammar size and occ is the number of MEMs in 𝒯. In the conclusions, we discuss how our idea can be modified to implement approximate pattern matching in compressed space.
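    The fix-free property is simple to state in code: a set of strings is fix-free when no string in the set is a prefix or a suffix of another. The small check below only illustrates the definition; the paper obtains such sets from a locally-consistent grammar, which is not reproduced here.

        # Check whether a set of strings is fix-free (prefix-free and suffix-free).
        # Illustrative helper only, not part of the paper's construction.
        def is_fix_free(strings):
            strings = list(strings)
            for i, s in enumerate(strings):
                for j, t in enumerate(strings):
                    if i == j:
                        continue
                    if t.startswith(s) or t.endswith(s):
                        return False
            return True

        assert is_fix_free(["ab", "ba", "aa"]) is True
        assert is_fix_free(["ab", "abc"]) is False   # "ab" is a prefix of "abc"
        assert is_fix_free(["ab", "cab"]) is False   # "ab" is a suffix of "cab"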

    Safely Filling Gaps with Partial Solutions Common to All Solutions

    Gap filling has emerged as a natural sub-problem of many de novo genome assembly projects. The gap filling problem generally asks for an s-t path in an assembly graph whose length matches the gap length estimate. Several methods have addressed it, but only a few have focused on strategies for dealing with multiple gap filling solutions and for guaranteeing reliable results. Such strategies include reporting only unique solutions, or exhaustively enumerating all filling solutions and heuristically creating their consensus. Our main contribution is a new method for reliable gap filling: filling gaps with those sub-paths common to all gap filling solutions. We call these partial solutions safe, following the framework of Tomescu and Medvedev (RECOMB 2016). We give an efficient safe algorithm running in O(dm) time and space, where d is the gap length estimate and m is the number of edges of the assembly graph. To show the benefits of this method, we implemented this algorithm for the problem of filling gaps in scaffolds. Our experimental results on bacterial and on conservative human assemblies show that, on average, our method can retrieve over 73 percent more safe and correct bases compared to previous methods, with similar precision.
    Peer reviewed
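    To make the notion of a safe partial solution concrete, the brute-force sketch below (exponential, unlike the O(dm) algorithm above) enumerates every s-t path using exactly d unit-length edges and keeps the positions on which all solutions agree; maximal runs of consecutive agreed positions correspond to sub-paths shared by all solutions. The graph, vertex names and unit-length edges are illustrative assumptions.

        # Brute-force illustration of "safe" gap filling sub-paths.
        def all_paths(graph, s, t, d):
            # All vertex sequences from s to t that use exactly d edges.
            if d == 0:
                return [[s]] if s == t else []
            paths = []
            for v in graph.get(s, []):
                for rest in all_paths(graph, v, t, d - 1):
                    paths.append([s] + rest)
            return paths

        def safe_positions(graph, s, t, d):
            # Positions 0..d where every length-d s-t path visits the same vertex.
            sols = all_paths(graph, s, t, d)
            if not sols:
                return {}
            return {k: sols[0][k] for k in range(d + 1)
                    if all(p[k] == sols[0][k] for p in sols)}

        # Two length-3 paths, s->a->c->t and s->b->c->t: only s, c and t are safe.
        graph = {'s': ['a', 'b'], 'a': ['c'], 'b': ['c'], 'c': ['t']}
        print(safe_positions(graph, 's', 't', 3))   # {0: 's', 2: 'c', 3: 't'}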

    Improved algorithms for string searching problems

    We present improved, practically efficient algorithms for several string searching problems, where we search for a short string called the pattern in a longer string called the text. We are mainly interested in the online problem, where the text is not preprocessed, but we also present a light indexing approach to speed up the exact searching of a single pattern. The new algorithms can be applied, e.g., to many problems in bioinformatics and other content scanning and filtering problems. In addition to exact string matching, we develop algorithms for several other variations of the string matching problem. We study algorithms for approximate string matching, where a limited number of errors is allowed in the occurrences of the pattern, and for parameterized string matching, where a substring of the text matches the pattern if the characters of the substring can be renamed in such a way that the renamed substring matches the pattern exactly. We also consider searching for multiple patterns simultaneously and searching for weighted patterns, where the weight of a character at a given position reflects the probability of that character occurring at that position. Many of the new algorithms use the backward matching principle, where the characters of the text that are aligned with the pattern are read backward, i.e. from right to left. Another common characteristic of the new algorithms is the use of q-grams, i.e. q consecutive characters are handled as a single character. Many of the new algorithms are bit-parallel, i.e. they pack several variables into a single computer word and update all of these variables with a single instruction. We show that the q-gram backward string matching algorithms that solve the exact, approximate, or multiple string matching problems are optimal on average. We also show that the q-gram backward string matching algorithm for the parameterized string matching problem is sublinear on average for a class of moderately repetitive patterns. All the presented algorithms are also shown to be fast in practice when compared to earlier algorithms. We also propose an alphabet sampling technique to speed up exact string matching. We choose a subset of the alphabet and select the corresponding subsequence of the text. String matching is then performed on this reduced subsequence, and the found matches are verified in the original text. We show how to choose the sampled alphabet optimally and show that the technique speeds up string matching, especially for moderate-length to long patterns.
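    As a small illustration of the bit-parallel principle mentioned above, the classic Shift-And algorithm below keeps the state of all pattern prefixes in one computer word and updates them with a shift, an OR and an AND per text character. It is shown only to make the principle concrete; the algorithms of the thesis additionally use q-grams and backward (right-to-left) scanning, which are not shown here.

        # Shift-And exact string matching: a classic bit-parallel algorithm.
        # Bit j of the state word D is set when pattern[0..j] matches the text
        # ending at the current position; one instruction updates all bits.
        def shift_and(text, pattern):
            m = len(pattern)
            if m == 0 or m > 64:
                raise ValueError("pattern length must be between 1 and 64")
            # Precompute a bit mask per character: bit j is set if pattern[j] == c.
            masks = {}
            for j, c in enumerate(pattern):
                masks[c] = masks.get(c, 0) | (1 << j)
            occurrences = []
            D = 0
            accept = 1 << (m - 1)
            for i, c in enumerate(text):
                D = ((D << 1) | 1) & masks.get(c, 0)
                if D & accept:
                    occurrences.append(i - m + 1)   # start position of the match
            return occurrences

        print(shift_and("abracadabra", "abra"))   # [0, 7]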